Multidimensional term indexing for efficient processing of complex queries

Authors

  • Michal Krátký
  • Tomáš Skopal
  • Václav Snášel
Abstract

Information Retrieval (IR) deals with the problems of storing and retrieving information within huge collections of text documents. In IR models, the semantics of a document is usually characterized by a set of terms. A need common to various IR models is efficient term retrieval, provided by a term index. Existing approaches to term indexing, e.g. the inverted list, efficiently support only simple queries asking for the occurrence of a term. In practice, we would like to exploit more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach to term indexing that provides efficient term retrieval and supports regular expression queries. Since terms usually differ in length, we also introduce an improvement based on a new data structure, called the BUB-forest, which provides even more efficient term retrieval.
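
The abstract does not spell out how terms become multidimensional data, so the following is only a minimal Python sketch under stated assumptions: each term is treated as a point whose i-th coordinate is the code of its i-th character, and a restricted regular expression (literal characters, `.` for any single character, and an optional trailing `.*`) becomes a box-shaped range query over those points. To avoid padding shorter terms up to a common dimension, the sketch keeps terms of each length in a separate group; this is an assumed reading of the motivation behind the BUB-forest, not the authors' structure. `TermForest`, `term_to_point`, and `in_box` are hypothetical names, and a linear scan stands in for the tree search a real index would perform.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Point = Tuple[int, ...]
MAX_CODE = 0x10FFFF  # largest Unicode code point


def term_to_point(term: str) -> Point:
    """Map a term to a point whose i-th coordinate is the code of its i-th character."""
    return tuple(ord(c) for c in term)


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """True if lo[i] <= p[i] <= hi[i] in every dimension."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


class TermForest:
    """Hypothetical stand-in for a forest of per-length indexes.

    Terms of length n are stored as n-dimensional points in their own group,
    so no padding is needed. A linear scan replaces a real tree search.
    """

    def __init__(self) -> None:
        self.groups: DefaultDict[int, List[Tuple[Point, str]]] = defaultdict(list)

    def insert(self, term: str) -> None:
        self.groups[len(term)].append((term_to_point(term), term))

    def query(self, pattern: str) -> List[str]:
        """Answer a restricted pattern: literal characters, '.' (any one
        character) and an optional trailing '.*' (any suffix). Any other
        character, including '*' elsewhere, is treated as a literal."""
        open_suffix = pattern.endswith(".*")
        body = pattern[:-2] if open_suffix else pattern
        result: List[str] = []
        for n, points in sorted(self.groups.items()):
            # Skip groups whose dimension cannot match the pattern.
            if n < len(body) or (not open_suffix and n != len(body)):
                continue
            lo: List[int] = []
            hi: List[int] = []
            for c in body:
                if c == ".":
                    lo.append(0)          # wildcard: whole coordinate range
                    hi.append(MAX_CODE)
                else:
                    lo.append(ord(c))     # literal: a single coordinate value
                    hi.append(ord(c))
            # Positions covered by a trailing '.*' are unconstrained.
            lo += [0] * (n - len(body))
            hi += [MAX_CODE] * (n - len(body))
            result += [t for p, t in points if in_box(p, tuple(lo), tuple(hi))]
        return result


forest = TermForest()
for t in ["data", "date", "dot", "database", "index"]:
    forest.insert(t)
print(forest.query("dat."))   # ['data', 'date']
print(forest.query("d.t.*"))  # ['dot', 'data', 'date', 'database']
```

A pattern such as `d.t.*` skips every group shorter than three characters and constrains only the first three coordinates in the remaining groups; a real multidimensional index would prune non-matching regions of the space instead of scanning every point.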

Similar resources

Efficient Concurrent Operations in Spatial Databases

As demanded by applications such as GIS, CAD, ecology analysis, and space research, efficient spatial data access methods have attracted much research. In particular, moving-object management and continuous spatial queries are becoming prominent in the spatial database area. However, most of the existing spatial query processing approaches were designed for single-user environments, which may no...

A DHT-Based System for the Management of Loosely Structured, Multidimensional Data

In this paper we present LinkedPeers, a DHT-based system designed for efficient distribution and processing of multidimensional, loosely structured data over a Peer-to-Peer overlay. Each dimension is further annotated with the use of concept hierarchies. The system design aims at incorporating two important features, namely large-scale support for partially-structured data and high-performance, ...

Efficient Computation of Data Cubes and Aggregate Views

This paper reviews the main techniques for the efficient calculation of aggregate multidimensional views and data cubes, possibly using specifically designed indexing structures. The efficient evaluation of aggregate multidimensional queries is obviously one of the most important aspects in data warehouses (OLAP systems). In particular, a fundamental requirement of such systems is the ability t...

CW2I: Community Data Indexing for Complex Query Processing

The increasing popularity of Community Web Management Systems (CWMSs) calls for tailor-made data management approaches for them. Still, existing CWMSs have mostly focused on simple similarity-based queries; they do not provide a framework for the efficient processing of more complex queries over community web data. In this paper, we propose a two-way indexing scheme that facilitates efficient an...

Optimization of Disk Accesses for Multidimensional Range Queries

Multidimensional data structures have become very popular in recent years. Their importance lies in the efficient indexing of naturally multidimensional data, such as navigation data and drawing specifications. The R-tree is a well-known structure based on bounding spatially close points by rectangles. Although efficient query processing of multidimensional data is requ...

Journal:
  • Kybernetika

Volume: 40

Publication date: 2004